Skip to content

Conversation

@emkornfield
Copy link
Collaborator

@emkornfield emkornfield commented Oct 9, 2025

🥞 Stacked PR

Use this link to review incremental changes.


An optional stats field is part of the remove schema. This PR adds it here. While utility of stats can be limited it is important to propagate for remove_actions in transactions to appropriately record number of records removed on a remove action. It is likely also important to not drop this field when doing things like log_compaction.

BREAKING CHANGE: Adds a field in the middle of a the remove_file schema.

@github-actions github-actions bot added the breaking-change Change that require a major version bump label Oct 9, 2025
@emkornfield emkornfield force-pushed the stack/add_stats_to_remove branch 2 times, most recently from 08fdb5c to d959169 Compare October 9, 2025 21:41
@codecov
Copy link

codecov bot commented Oct 9, 2025

Codecov Report

❌ Patch coverage is 71.42857% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.96%. Comparing base (87e3552) to head (9317082).

Files with missing lines Patch % Lines
kernel/src/actions/visitors.rs 66.66% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1390      +/-   ##
==========================================
- Coverage   84.96%   84.96%   -0.01%     
==========================================
  Files         114      114              
  Lines       29445    29449       +4     
  Branches    29445    29449       +4     
==========================================
+ Hits        25017    25020       +3     
  Misses       3220     3220              
- Partials     1208     1209       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Please also run+fix the examples (inspect-table is already broken from previous schema changes that weren't incorporated, #1372)

@emkornfield emkornfield marked this pull request as ready for review October 9, 2025 23:29
@nicklan nicklan requested review from nicklan and removed request for OussamaSaoudi and zachschuermann October 14, 2025 23:17
@emkornfield emkornfield force-pushed the stack/add_stats_to_remove branch from d959169 to 05176bf Compare October 15, 2025 18:38
@emkornfield
Copy link
Collaborator Author

Please also run+fix the examples (inspect-table is already broken from previous schema changes that weren't incorporated, #1372)

Hopefully this should be covered by CI now. We will see how it does

@emkornfield emkornfield requested a review from scovich October 15, 2025 18:56
@emkornfield emkornfield force-pushed the stack/add_stats_to_remove branch from 05176bf to 7637fb9 Compare October 15, 2025 19:57
emkornfield added a commit that referenced this pull request Oct 15, 2025
## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/1281/files) to
review incremental changes.
-
[**stack/with_data_change**](#1281)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1281/files)]
-
[stack/add_stats_to_remove](#1390)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1390/files/a9159f03f6260388d7bbf9f4fd2ca06f8db21f8f..7637fb90686733d81815f4fae5f9e47880f55a4f)]
-
[stack/remove_files](#1353)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1353/files/7637fb90686733d81815f4fae5f9e47880f55a4f..94e2108cbea608b98284c634807b4f265cddfa29)]

---------
This PR adds a with_data_change method to apply whether a transaction
represents a data change to all the file additions (and in the future
removals). This helps prevents clients from making accidentally mixing
files marked with dataChange = true or false, and is in the long term
the direction we want to go in.

In the future we will also like want to physically record dataChange at
the commit level, and move away from per file denotation.

Some implications of this change:

* The schema of add_files is now dependent on whether this flag is sent
(this is to allow connectors to maintain backwards compatibility).
* Change the default engine to no longer pass through dataChange at the
file level but instead require using the new setter.

Testing:
1.  Existing FFI tests verify backwards compatibility.
2. Write tests now call this method with the default engine and no
change of output happens.
3. An additional test is added to verify setting the flag to false,
writes the appropriate value on add actions.

BREAKING CHANGE: Engines must use with_data_change on the transaction
level instead of passing it to the method. `add_files_schema` is moved
to be scoped on a the transaction.
@emkornfield emkornfield force-pushed the stack/add_stats_to_remove branch from 7637fb9 to 9317082 Compare October 15, 2025 20:08
samansmink pushed a commit to samansmink/delta-kernel-rs that referenced this pull request Oct 19, 2025
## 🥞 Stacked PR
Use this
[link](https://github.com/delta-io/delta-kernel-rs/pull/1281/files) to
review incremental changes.
-
[**stack/with_data_change**](delta-io#1281)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1281/files)]
-
[stack/add_stats_to_remove](delta-io#1390)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1390/files/a9159f03f6260388d7bbf9f4fd2ca06f8db21f8f..7637fb90686733d81815f4fae5f9e47880f55a4f)]
-
[stack/remove_files](delta-io#1353)
[[Files
changed](https://github.com/delta-io/delta-kernel-rs/pull/1353/files/7637fb90686733d81815f4fae5f9e47880f55a4f..94e2108cbea608b98284c634807b4f265cddfa29)]

---------
This PR adds a with_data_change method to apply whether a transaction
represents a data change to all the file additions (and in the future
removals). This helps prevents clients from making accidentally mixing
files marked with dataChange = true or false, and is in the long term
the direction we want to go in.

In the future we will also like want to physically record dataChange at
the commit level, and move away from per file denotation.

Some implications of this change:

* The schema of add_files is now dependent on whether this flag is sent
(this is to allow connectors to maintain backwards compatibility).
* Change the default engine to no longer pass through dataChange at the
file level but instead require using the new setter.

Testing:
1.  Existing FFI tests verify backwards compatibility.
2. Write tests now call this method with the default engine and no
change of output happens.
3. An additional test is added to verify setting the flag to false,
writes the appropriate value on add actions.

BREAKING CHANGE: Engines must use with_data_change on the transaction
level instead of passing it to the method. `add_files_schema` is moved
to be scoped on a the transaction.
@emkornfield emkornfield changed the title [feat!] Add optional stats field to remove action feat!: Add optional stats field to remove action Oct 21, 2025
Copy link
Collaborator

@nicklan nicklan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine, but I think there's a small bug

Comment on lines 202 to 204
let base_row_id: Option<i64> = getters[12].get_opt(row_index, "remove.baseRowId")?;
let default_row_commit_version: Option<i64> =
getters[13].get_opt(row_index, "remove.defaultRowCommitVersion")?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These have to bump up too right?

Suggested change
let base_row_id: Option<i64> = getters[13].get_opt(row_index, "remove.baseRowId")?;
let default_row_commit_version: Option<i64> =
getters[14].get_opt(row_index, "remove.defaultRowCommitVersion")?;

Probably indicates we have a lack of test coverage for these

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking-change Change that require a major version bump

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants